
Real-Time Bidding by Reinforcement Learning in Display Advertising



Abstract

The majority of online display ads are served through real-time bidding (RTB) --- each ad display impression is auctioned off in real-time when it is just being generated from a user visit. To place an ad automatically and optimally, it is critical for advertisers to devise a learning algorithm to cleverly bid an ad impression in real-time. Most previous works consider the bid decision as a static optimization problem of either treating the value of each impression independently or setting a bid price to each segment of ad volume. However, the bidding for a given ad campaign would repeatedly happen during its life span before the budget runs out. As such, each bid is strategically correlated by the constrained budget and the overall effectiveness of the campaign (e.g., the rewards from generated clicks), which is only observed after the campaign has completed. Thus, it is of great interest to devise an optimal bidding strategy sequentially so that the campaign budget can be dynamically allocated across all the available impressions on the basis of both the immediate and future rewards. In this paper, we formulate the bid decision process as a reinforcement learning problem, where the state space is represented by the auction information and the campaign's real-time parameters, while an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy to optimize the advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem from the large real-world auction volume and campaign budget is well handled by state value approximation using neural networks.
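The MDP formulation sketched in the abstract can be illustrated with a minimal toy example. The sketch below is an assumption-laden reconstruction, not the paper's actual algorithm: it assumes a second-price auction, a known discrete market-price distribution `market_prob`, a fixed expected click reward `ctr` per won impression, and a state of (remaining auctions `t`, remaining budget `b`). It solves the small tabular case by dynamic programming; the paper additionally handles large auction volumes and budgets via neural-network value approximation, which is omitted here.

```python
def optimal_bid_table(T, B, market_prob, ctr):
    """Tabular dynamic programming over states (t, b).

    V[t][b] = expected future clicks with t auctions left and budget b.
    market_prob[d] = probability the highest competing bid (market price)
    is d; winning (bid >= d) pays d and earns the expected click value ctr.
    All names here are illustrative, not from the paper.
    """
    V = [[0.0] * (B + 1) for _ in range(T + 1)]       # V[0][b] = 0: no auctions left
    policy = [[0] * (B + 1) for _ in range(T + 1)]
    for t in range(1, T + 1):
        for b in range(B + 1):
            best_val, best_bid = V[t - 1][b], 0       # bid 0: lose, keep budget
            for bid in range(1, b + 1):               # cannot bid beyond budget
                val = 0.0
                for d, p in enumerate(market_prob):
                    if d <= bid:                       # win: pay d, gain ctr
                        val += p * (ctr + V[t - 1][b - d])
                    else:                              # lose: budget unchanged
                        val += p * V[t - 1][b]
                if val > best_val:
                    best_val, best_bid = val, bid
            V[t][b] = best_val
            policy[t][b] = best_bid
    return V, policy
```

The budget coupling the abstract describes shows up directly: the value of winning at price `d` includes the continuation value `V[t-1][b-d]`, so an aggressive bid now is traded off against the rewards of future impressions.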
